GWLR - Osun Water Points

Published

December 17, 2022

case study : Modelling the Spatial Variation of the Explanatory Factors of Water Point Status using Geographically Weighted Logistic Regression (GWLR).

1. OVERVIEW

This study focuses on GWLR analysis based on Nigeria’s water points attributes.

1.1 Objectives

To build an explanatory model to discover factor affecting water point status in Osun State, Nigeria :

Study area : Osun State, Nigeria


2. R PACKAGE REQUIRED

The following are the packages required for this exercise :

2.1 Load R Packages into R Environment

Usage of the code chunk below :

p_load( ) - pacman - to load packages. This function will attempt to install the package from CRAN or pacman repository list if its found not installed.

pacman::p_load(sf, tidyverse, funModeling, blorr, corrplot, ggpubr, spdep, GWmodel, tmap, skimr, caret, questionr, berryFunctions)


3. GEOSPATIAL DATA

3.1 Acquire Data Source

  • Aspatial Data

    • Osun_wp_sf.rds, contained water points within Osun state.

      • It is in sf point data frame.
  • Geospatial Data

    • Osun.rds, contains LGAs boundaries of Osun State.

      • It is in sf polygon data frame

3.2 Import Data

3.2.1 Import Boundary RDS File

bdy_osun <- read_rds("data/geodata/Osun.rds")

3.2.1.1 review imported data

skim(bdy_osun)
Warning: Couldn't find skimmers for class: sfc_MULTIPOLYGON, sfc; No user-
defined `sfl` provided. Falling back to `character`.
Data summary
Name bdy_osun
Number of rows 30
Number of columns 5
_______________________
Column type frequency:
character 5
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
ADM2_EN 0 1 3 14 0 30 0
ADM2_PCODE 0 1 8 8 0 30 0
ADM1_EN 0 1 4 4 0 1 0
ADM1_PCODE 0 1 5 5 0 1 0
geometry 0 1 1805 7898 0 30 0
freq(duplicated(bdy_osun$ADM2_EN))
       n   % val%
FALSE 30 100  100
freq(bdy_osun$ADM1_EN)
      n   % val%
Osun 30 100  100

3.2.2 Import Attribute RDS

wp_osun <- read_rds("data/geodata/Osun_wp_sf.rds")

3.2.2.1 review imported data

skim(wp_osun)
Warning: Couldn't find skimmers for class: sfc_POINT, sfc; No user-defined `sfl`
provided. Falling back to `character`.
Data summary
Name wp_osun
Number of rows 4760
Number of columns 75
_______________________
Column type frequency:
character 47
logical 5
numeric 23
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
source 0 1.00 5 44 0 2 0
report_date 0 1.00 22 22 0 42 0
status_id 0 1.00 2 7 0 3 0
water_source_clean 0 1.00 8 22 0 3 0
water_source_category 0 1.00 4 6 0 2 0
water_tech_clean 24 0.99 9 23 0 3 0
water_tech_category 24 0.99 9 15 0 2 0
facility_type 0 1.00 8 8 0 1 0
clean_country_name 0 1.00 7 7 0 1 0
clean_adm1 0 1.00 3 5 0 5 0
clean_adm2 0 1.00 3 14 0 35 0
clean_adm3 4760 0.00 NA NA 0 0 0
clean_adm4 4760 0.00 NA NA 0 0 0
installer 4760 0.00 NA NA 0 0 0
management_clean 1573 0.67 5 37 0 7 0
status_clean 0 1.00 9 32 0 7 0
pay 0 1.00 2 39 0 7 0
fecal_coliform_presence 4760 0.00 NA NA 0 0 0
subjective_quality 0 1.00 18 20 0 4 0
activity_id 4757 0.00 36 36 0 3 0
scheme_id 4760 0.00 NA NA 0 0 0
wpdx_id 0 1.00 12 12 0 4760 0
notes 0 1.00 2 96 0 3502 0
orig_lnk 4757 0.00 84 84 0 1 0
photo_lnk 41 0.99 84 84 0 4719 0
country_id 0 1.00 2 2 0 1 0
data_lnk 0 1.00 79 96 0 2 0
water_point_history 0 1.00 142 834 0 4750 0
clean_country_id 0 1.00 3 3 0 1 0
country_name 0 1.00 7 7 0 1 0
water_source 0 1.00 8 30 0 4 0
water_tech 0 1.00 5 37 0 20 0
adm2 0 1.00 3 14 0 33 0
adm3 4760 0.00 NA NA 0 0 0
management 1573 0.67 5 47 0 7 0
adm1 0 1.00 4 5 0 4 0
New Georeferenced Column 0 1.00 16 35 0 4760 0
lat_lon_deg 0 1.00 13 32 0 4760 0
public_data_source 0 1.00 84 102 0 2 0
converted 0 1.00 53 53 0 1 0
created_timestamp 0 1.00 22 22 0 2 0
updated_timestamp 0 1.00 22 22 0 2 0
Geometry 0 1.00 33 37 0 4760 0
ADM2_EN 0 1.00 3 14 0 30 0
ADM2_PCODE 0 1.00 8 8 0 30 0
ADM1_EN 0 1.00 4 4 0 1 0
ADM1_PCODE 0 1.00 5 5 0 1 0

Variable type: logical

skim_variable n_missing complete_rate mean count
rehab_year 4760 0 NaN :
rehabilitator 4760 0 NaN :
is_urban 0 1 0.39 FAL: 2884, TRU: 1876
latest_record 0 1 1.00 TRU: 4760
status 0 1 0.56 TRU: 2642, FAL: 2118

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
row_id 0 1.00 68550.48 10216.94 49601.00 66874.75 68244.50 69562.25 471319.00 ▇▁▁▁▁
lat_deg 0 1.00 7.68 0.22 7.06 7.51 7.71 7.88 8.06 ▁▂▇▇▇
lon_deg 0 1.00 4.54 0.21 4.08 4.36 4.56 4.71 5.06 ▃▆▇▇▂
install_year 1144 0.76 2008.63 6.04 1917.00 2006.00 2010.00 2013.00 2015.00 ▁▁▁▁▇
fecal_coliform_value 4760 0.00 NaN NA NA NA NA NA NA
distance_to_primary_road 0 1.00 5021.53 5648.34 0.01 719.36 2972.78 7314.73 26909.86 ▇▂▁▁▁
distance_to_secondary_road 0 1.00 3750.47 3938.63 0.15 460.90 2554.25 5791.94 19559.48 ▇▃▁▁▁
distance_to_tertiary_road 0 1.00 1259.28 1680.04 0.02 121.25 521.77 1834.42 10966.27 ▇▂▁▁▁
distance_to_city 0 1.00 16663.99 10960.82 53.05 7930.75 15030.41 24255.75 47934.34 ▇▇▆▃▁
distance_to_town 0 1.00 16726.59 12452.65 30.00 6876.92 12204.53 27739.46 44020.64 ▇▅▃▃▂
rehab_priority 2654 0.44 489.33 1658.81 0.00 7.00 91.50 376.25 29697.00 ▇▁▁▁▁
water_point_population 4 1.00 513.58 1458.92 0.00 14.00 119.00 433.25 29697.00 ▇▁▁▁▁
local_population_1km 4 1.00 2727.16 4189.46 0.00 176.00 1032.00 3717.00 36118.00 ▇▁▁▁▁
crucialness_score 798 0.83 0.26 0.28 0.00 0.07 0.15 0.35 1.00 ▇▃▁▁▁
pressure_score 798 0.83 1.46 4.16 0.00 0.12 0.41 1.24 93.69 ▇▁▁▁▁
usage_capacity 0 1.00 560.74 338.46 300.00 300.00 300.00 1000.00 1000.00 ▇▁▁▁▅
days_since_report 0 1.00 2692.69 41.92 1483.00 2688.00 2693.00 2700.00 4645.00 ▁▇▁▁▁
staleness_score 0 1.00 42.80 0.58 23.13 42.70 42.79 42.86 62.66 ▁▁▇▁▁
location_id 0 1.00 235865.49 6657.60 23741.00 230638.75 236199.50 240061.25 267454.00 ▁▁▁▁▇
cluster_size 0 1.00 1.05 0.25 1.00 1.00 1.00 1.00 4.00 ▇▁▁▁▁
lat_deg_original 4760 0.00 NaN NA NA NA NA NA NA
lon_deg_original 4760 0.00 NaN NA NA NA NA NA NA
count 0 1.00 1.00 0.00 1.00 1.00 1.00 1.00 1.00 ▁▁▇▁▁

3.2.3 Extract Functional Water Point

EDA

tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(bdy_osun) +
  tm_polygons(alpha = 0.4) +
  
tm_shape(wp_osun) +
  tm_dots(col = "status_clean",
          alpha = 0.6) +
  tm_view(set.zoom.limits = c(8.5,12))
tmap_mode("plot")
tmap mode set to plotting
wp_osun.sf <- wp_osun %>%
  filter_at(vars(status,
                 distance_to_primary_road,
                 distance_to_secondary_road,
                 distance_to_tertiary_road,
                 distance_to_city,
                 distance_to_town,
                 water_point_population,
                 local_population_1km,
                 usage_capacity,
                 is_urban,
                 water_source_clean),
            all_vars(!is.na(.)))%>%
  mutate(usage_capacity = as.factor(usage_capacity))

4. CORRELATION ANALYSIS

4.1 Create Data Table for Correlation Matrix Analysis

4.1.1 Get Column Index for the Key Variable

match(c("distance_to_primary_road",
        "distance_to_secondary_road",
        "distance_to_tertiary_road",
        "distance_to_city",
        "distance_to_town",
        "water_point_population",
        "local_population_1km",
        "usage_capacity",
        "is_urban",
        "water_source_clean",
        "status",
        "geometry"),
      names(wp_osun.sf))
 [1] 35 36 37 38 39 42 43 46 47  7 57 NA

4.1.2 Create Data Table for Correlation Analysis

wp_osun.sf_clean <- wp_osun.sf %>%
  select(c(7, 35:39, 42:43, 46:47, 57)) %>%
  st_set_geometry(NULL)

4.2 Visualise Correlation Matrix

cluster_vars.cor = cor(wp_osun.sf_clean[,2:7])

corrplot.mixed(cluster_vars.cor,
              lower = "ellipse",
              upper = "number",
              tl.pos = "lt",
              diag = "l",
              tl.col = "black")

model <- glm(status ~
               distance_to_primary_road +
               distance_to_secondary_road +
               distance_to_tertiary_road +
               distance_to_city +
               distance_to_town +
               is_urban +
               usage_capacity +
               water_source_clean +
               water_point_population +
               local_population_1km,
             data = wp_osun.sf,
             family = binomial(link = 'logit'))

Create Model Overview

blr_regress(model)
                             Model Overview                              
------------------------------------------------------------------------
Data Set    Resp Var    Obs.    Df. Model    Df. Residual    Convergence 
------------------------------------------------------------------------
  data       status     4756      4755           4744           TRUE     
------------------------------------------------------------------------

                    Response Summary                     
--------------------------------------------------------
Outcome        Frequency        Outcome        Frequency 
--------------------------------------------------------
   0             2114              1             2642    
--------------------------------------------------------

                                 Maximum Likelihood Estimates                                   
-----------------------------------------------------------------------------------------------
               Parameter                    DF    Estimate    Std. Error    z value     Pr(>|z|) 
-----------------------------------------------------------------------------------------------
              (Intercept)                   1      0.3887        0.1124      3.4588       5e-04 
        distance_to_primary_road            1      0.0000        0.0000     -0.7153      0.4744 
       distance_to_secondary_road           1      0.0000        0.0000     -0.5530      0.5802 
       distance_to_tertiary_road            1      1e-04         0.0000      4.6708      0.0000 
            distance_to_city                1      0.0000        0.0000     -4.7574      0.0000 
            distance_to_town                1      0.0000        0.0000     -4.9170      0.0000 
              is_urbanTRUE                  1     -0.2971        0.0819     -3.6294       3e-04 
           usage_capacity1000               1     -0.6230        0.0697     -8.9366      0.0000 
water_source_cleanProtected Shallow Well    1      0.5040        0.0857      5.8783      0.0000 
   water_source_cleanProtected Spring       1      1.2882        0.4388      2.9359      0.0033 
         water_point_population             1      -5e-04        0.0000    -11.3686      0.0000 
          local_population_1km              1      3e-04         0.0000     19.2953      0.0000 
-----------------------------------------------------------------------------------------------

 Association of Predicted Probabilities and Observed Responses  
---------------------------------------------------------------
% Concordant          0.7347          Somers' D        0.4693   
% Discordant          0.2653          Gamma            0.4693   
% Tied                0.0000          Tau-a            0.2318   
Pairs                5585188          c                0.7347   
---------------------------------------------------------------

create response profile

extract maximum likelihood estimates.

estimate concordant

blr_confusion_matrix(model, cutoff = 0.5)
Confusion Matrix and Statistics 

          Reference
Prediction FALSE TRUE
         0  1301  738
         1   813 1904

                Accuracy : 0.6739 
     No Information Rate : 0.4445 

                   Kappa : 0.3373 

McNemars's Test P-Value  : 0.0602 

             Sensitivity : 0.7207 
             Specificity : 0.6154 
          Pos Pred Value : 0.7008 
          Neg Pred Value : 0.6381 
              Prevalence : 0.5555 
          Detection Rate : 0.4003 
    Detection Prevalence : 0.5713 
       Balanced Accuracy : 0.6680 
               Precision : 0.7008 
                  Recall : 0.7207 

        'Positive' Class : 1

Build Geographical Weight

spatial point data frame

wp_osun_sf_clean is the cleaned version without missing value.

wp_osun.sp <- wp_osun.sf %>%
  select(c(status,
           distance_to_primary_road,
           distance_to_secondary_road,
           distance_to_tertiary_road,
           distance_to_city,
           distance_to_town,
           is_urban,
           usage_capacity,
           water_source_clean,
           water_point_population,
           local_population_1km)) %>%
  as_Spatial()

wp_osun.sp
class       : SpatialPointsDataFrame 
features    : 4756 
extent      : 182502.4, 290751, 340054.1, 450905.3  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs 
variables   : 11
names       : status, distance_to_primary_road, distance_to_secondary_road, distance_to_tertiary_road, distance_to_city, distance_to_town, is_urban, usage_capacity, water_source_clean, water_point_population, local_population_1km 
min values  :      0,        0.014461356813335,          0.152195902540837,         0.017815121653488, 53.0461399623541, 30.0019777713073,        0,           1000,           Borehole,                      0,                    0 
max values  :      1,         26909.8616132094,           19559.4793799085,          10966.2705628969,  47934.343603562, 44020.6393368124,        1,            300,   Protected Spring,                  29697,                36118 
bw.fixed <- bw.ggwr(status ~ distance_to_primary_road +
                      distance_to_secondary_road +
                      distance_to_tertiary_road +
                      distance_to_city +
                      distance_to_town +
                      is_urban +
                      usage_capacity +
                      water_source_clean +
                      water_point_population +
                      local_population_1km,
                    data = wp_osun.sp,
                    family = "binomial",
                    approach = "AIC",
                    kernel = "gaussian",
                    adaptive = FALSE,
                    longlat = FALSE)
bw.fixed
gwlr.fixed <- ggwr.basic(status ~
                           distance_to_primary_road + 
                           distance_to_secondary_road +
                           distance_to_tertiary_road +
                           distance_to_city +
                           distance_to_town +
                           is_urban +
                           usage_capacity +
                           water_source_clean +
                           water_point_population +
                           local_population_1km,
                    data = wp_osun.sp,
                    bw = 2599.672,
                    family = "binomial",
                    kernel = "gaussian",
                    adaptive = FALSE,
                    longlat = FALSE)
 Iteration    Log-Likelihood
=========================
       0        -1958 
       1        -1676 
       2        -1526 
       3        -1443 
       4        -1405 
       5        -1405 
gwr.fixed <- as.data.frame(gwlr.fixed$SDF)

Set the yhat (threshold value) to 0.5 into 1 and else 0. The result of the logic comparison operation will be saved into a field called most.

gwr.fixed <- gwr.fixed %>% 
  mutate(most = ifelse(
    gwr.fixed$yhat >= 0.5, T, F
  ))
freq(gwr.fixed$most)
         n    % val%
FALSE 2087 43.9 43.9
TRUE  2669 56.1 56.1
gwr.fixed$y <- as.factor(gwr.fixed$y)
gwr.fixed$most <- as.factor(gwr.fixed$most)
CM <- confusionMatrix(data = gwr.fixed$most, reference = gwr.fixed$y)
CM
Confusion Matrix and Statistics

          Reference
Prediction FALSE TRUE
     FALSE  1824  263
     TRUE    290 2379
                                          
               Accuracy : 0.8837          
                 95% CI : (0.8743, 0.8927)
    No Information Rate : 0.5555          
    P-Value [Acc > NIR] : <2e-16          
                                          
                  Kappa : 0.7642          
                                          
 Mcnemar's Test P-Value : 0.2689          
                                          
            Sensitivity : 0.8628          
            Specificity : 0.9005          
         Pos Pred Value : 0.8740          
         Neg Pred Value : 0.8913          
             Prevalence : 0.4445          
         Detection Rate : 0.3835          
   Detection Prevalence : 0.4388          
      Balanced Accuracy : 0.8816          
                                          
       'Positive' Class : FALSE           
                                          
wp_osun.sf_selected <- wp_osun.sf %>% 
  select(c(ADM2_EN, ADM2_PCODE, ADM1_EN, ADM1_PCODE, status))
gwr_sf.fixed <- cbind(wp_osun.sf_selected, gwr.fixed)

Visualise Coefficient Estimates

Create interactive point symbol map.

tmap_mode("view")
tmap mode set to interactive viewing
prob_T <- tm_shape(bdy_osun) +
  tm_polygons(alpha = 0.1) +
tm_shape(gwr_sf.fixed) +
  tm_dots(col = "yhat",
          border.col = "gray60",
          border.lwd = 1) +
  tm_view(set.zoom.limits = c(8,14))
prob_T
tmap_mode("plot")
tmap mode set to plotting
tmap_mode("view")
tmap mode set to interactive viewing
tertiary_TV <- tm_shape(bdy_osun) + 
  tm_polygons(alpha = 0.1) + 
  tm_shape(gwr_sf.fixed) + 
  tm_dots(col = "distance_to_tertiary_road_TV", 
          border.col = "gray60",
          border.lwd = 1) +
  tm_view(set.zoom.limits = c(8,14))
tmap_mode("plot")
tmap mode set to plotting